Effects of Sampling on Twitter Trend Detection
نویسندگان
چکیده
Much research has focused on detecting trends on Twitter, including health-related trends such as mentions of Influenza-like illnesses or their symptoms. The majority of this research has been conducted using Twitter’s public feed, which includes only about 1% of all public tweets. It is unclear if, when, and how using Twitter’s 1% feed has affected the evaluation of trend detection methods. In this work we use a larger feed to investigate the effects of sampling on Twitter trend detection. We focus on using health-related trends to estimate the prevalence of Influenza-like illnesses based on tweets, and use ground truth obtained from the CDC and Google Flu Trends to explore how the prevalence estimates degrade when moving from a 100% to a 1% sample. We find that using the public 1% sample is unlikely to substantially harm ILI estimates made at the national level, but can cause poor performance when estimates are made at the city level.
منابع مشابه
Entropy-based Classification of 'Retweeting' Activity on Twitter
Twitter is used for a variety of reasons, including information dissemination, marketing, political organizing and to spread propaganda, spamming, promotion, conversations, and so on. Characterizing these activities and categorizing associated user generated content is a challenging task. We present a information-theoretic approach to classification of user activity on Twitter. We focus on twee...
متن کاملDetection of Twitter Users' Attitudes about Flu Vaccine based on the Content and Sentiment Analysis of the Sent Tweets
Introduction: The influenza vaccine is one of the controversial challenges in today's societies. Considering the importance of using the flu vaccine in preventing the spread of influenza virus, the Twitter network, as a rich source of data, provides suitable conditions for research in this field to examine the attitudes of different people about this vaccine. The results in one hand will help h...
متن کاملA High-Performance Model based on Ensembles for Twitter Sentiment Classification
Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people of the world intensity use social networks like Twitter. It supports users to publish tweets to tell what they are thinking about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...
متن کاملDetection of Twitter Users' Attitudes about Flu Vaccine based on the Content and Sentiment Analysis of the Sent Tweets
Introduction: The influenza vaccine is one of the controversial challenges in today's societies. Considering the importance of using the flu vaccine in preventing the spread of influenza virus, the Twitter network, as a rich source of data, provides suitable conditions for research in this field to examine the attitudes of different people about this vaccine. The results in one hand will help h...
متن کاملSentiment Analysis and Trend Detection in Twitter
Social networks such as Twitter are considered a rich resource of information about actual world actions of all types. Several efforts have been dedicated to trend detection on Twitter i.e., the current popular topics of conver‐ sation among its users. However, despite these efforts, sentiment analysis is not taken into account. Sentiment analysis is the field of study that analyzes people’s op...
متن کامل